Locating Discontinuities in Synthetic Speech using a Perceptually Orientated Approach
Abstract
A significant problem with unit-selection-based speech synthesis is that listeners perceive discontinuities at the points where speech waveforms are joined. This work demonstrates how three different perceptually motivated time-frequency representations, together with associated distance measures, can be applied to identify such discontinuities.
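The abstract does not name the three representations or their measures. As a rough illustration of the general idea only, the sketch below assumes one plausible choice: a Bark-scaled log spectrum (Traunmüller's 1990 Bark approximation) computed on the frames either side of a join, compared with a Euclidean distance. The function names (`hz_to_bark`, `bark_log_energies`, `join_distance`), the 18-band resolution and the periodic Hann window are hypothetical choices made for this example, not the authors' method.

```python
import cmath
import math
from typing import List, Sequence


def hz_to_bark(freq_hz: float) -> float:
    """Traunmueller's (1990) approximation of the Bark critical-band scale."""
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Power in DFT bins 0..N/2 of a periodic-Hann-windowed frame (naive DFT)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(frame)]
    powers = []
    for k in range(n // 2 + 1):
        coeff = sum(w * cmath.exp(-2j * math.pi * k * i / n)
                    for i, w in enumerate(windowed))
        powers.append(abs(coeff) ** 2)
    return powers


def bark_log_energies(frame: Sequence[float], sample_rate: float,
                      n_bands: int = 18) -> List[float]:
    """Log energy (dB) in n_bands equal-width Bark bands from 0 Hz to Nyquist.

    Hypothetical representation chosen for illustration; n_bands=18 is an assumption.
    """
    n = len(frame)
    lo, hi = hz_to_bark(0.0), hz_to_bark(sample_rate / 2.0)
    energies = [0.0] * n_bands
    for k, power in enumerate(power_spectrum(frame)):
        position = (hz_to_bark(k * sample_rate / n) - lo) / (hi - lo)
        band = min(int(position * n_bands), n_bands - 1)  # Nyquist bin joins top band
        energies[band] += power
    # Small floor avoids log(0) for bands that contain no energy.
    return [10.0 * math.log10(e + 1e-12) for e in energies]


def join_distance(left: Sequence[float], right: Sequence[float],
                  sample_rate: float, n_bands: int = 18) -> float:
    """Euclidean distance between the Bark-band log spectra of the frames on
    either side of a join; larger values suggest a more audible discontinuity."""
    if len(left) != len(right):
        raise ValueError("frames on both sides of the join must have equal length")
    a = bark_log_energies(left, sample_rate, n_bands)
    b = bark_log_energies(right, sample_rate, n_bands)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

For example, with 256-sample frames at 16 kHz, joining a 500 Hz tone to a copy of itself scores 0. Joining it to the same tone at double amplitude scores about 6 dB, the level change in the single occupied band. Joining it to a 3000 Hz tone scores far higher, because energy moves into a different critical band. In a real system, joins whose score exceeded some tuned threshold would be flagged as candidate discontinuities. Whether such a threshold agrees with listener judgements is exactly the kind of question that perceptual evaluation would have to answer.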
Similar resources
Perceptually-based Data-driven Join Co
Unit selection synthesis has improved the quality of synthetic speech by making it possible to concatenate speech from a large database to produce intelligible synthesis while preserving much of the naturalness of the original signal. Such synthesis is by no means perfect, however, and this paper describes work to achieve more optimal joins between concatenated units. Results from a psychoacous...
Perceptually-based data-driven join costs: comparing join types
Unit selection synthesis has improved the quality of synthetic speech by making it possible to concatenate speech from a large database to produce intelligible synthesis while preserving much of the naturalness of the original signal. Such synthesis is by no means perfect, however, and this paper describes work to achieve more optimal joins between concatenated units. Results from a psychoacous...
Feature transformation applied to the detection of discontinuities in concatenated speech
The quality of concatenated speech depends on the degree of mismatch between successive units. Defining a perceptually salient join cost to represent the degree of mismatch has proven to be a difficult task. Such a join cost is critical in unit selection synthesis to ensure that the optimum sequence of speech units is selected from the units available in the speech inventory. In this study the ...
Automatic Segmentation Combining and Spectral Boundary
Currently, AT&T Labs’ Natural Voices multilingual TTS system produces high-quality synthetic speech with a large-scale speech corpus [1]. In the development of such systems, automatic segmentation constitutes a major component technology. The prevalent approach to automatic segmentation in speech synthesis is based on Hidden Markov Models (HMMs). Even though an HMM-based approach is the most automat...
Listeners' weighting of acoustic cues to synthetic speech naturalness: A multidimensional scaling analysis
The quality of current commercial speech synthesis systems is now so high that system improvements are being made at subtle sub- and supra-segmental levels. Human perceptual evaluation of such subtle improvements requires a highly sophisticated level of perceptual attention to specific acoustic characteristics or cues. However, it is not well understood what acoustic cues listeners attend to by d...
Publication date: 2009